---
title: Apache Spark API for Scoring Code
description: Learn how to use the Spark API for Scoring Code, a library that integrates Scoring Code JARs into Spark clusters.

---

# Apache Spark API for Scoring Code {: #apache-spark-api-for-scoring-code }

The Spark API for Scoring Code library integrates DataRobot Scoring Code JARs into Spark clusters. It is available as a [PySpark API](#pyspark-api) and a [Spark Scala API](#spark-scala-api).

In previous versions, the Spark API for Scoring Code consisted of multiple libraries, each supporting a specific Spark version. Now, a single library supports all of the following Spark versions:

* Spark 2.4.1 or greater
* Spark 3.x

!!! important
     Spark must be compiled for Scala 2.12.

For a list of the deprecated, Spark version-specific libraries, see the [Deprecated Spark libraries](#deprecated-spark-libraries) section.

## PySpark API {: #pyspark-api }

The PySpark API for Scoring Code is included in the [`datarobot-predict`](https://pypi.org/project/datarobot-predict/){ target=_blank } Python package, released on PyPI. The PyPI project description contains documentation and usage examples.
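To get started, you can install the package from PyPI:

``` sh
pip install datarobot-predict
```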

## Spark Scala API {: #spark-scala-api }

The Spark Scala API for Scoring Code is published on Maven as [`scoring-code-spark-api`](https://central.sonatype.com/artifact/com.datarobot/scoring-code-spark-api){ target=_blank }. For more information, see the [API reference documentation](https://javadoc.io/doc/com.datarobot/scoring-code-spark-api_3.0.0/latest/com/datarobot/prediction/spark30/Predictors$.html){ target=_blank }.
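If you build your Spark application with sbt, the Maven coordinate can be declared as a dependency. The following is a minimal sketch; substitute the release version you need:

``` scala
// build.sbt (sketch): pull the Spark Scala API for Scoring Code from Maven Central
libraryDependencies += "com.datarobot" % "scoring-code-spark-api" % "<version>"
```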

Before using the Spark API, you must add it to the Spark classpath. For `spark-shell`, use the `--packages` parameter to load the dependencies directly from Maven:

``` sh
spark-shell --conf "spark.driver.memory=2g" \
     --packages com.datarobot:scoring-code-spark-api:VERSION \
     --jars model.jar
```
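A packaged application submitted with `spark-submit` can load the same dependencies. The following is a sketch, where `your-app.jar` is a hypothetical application JAR:

``` sh
spark-submit --conf "spark.driver.memory=2g" \
     --packages com.datarobot:scoring-code-spark-api:VERSION \
     --jars model.jar \
     your-app.jar
```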

### Score a CSV file {: #score-a-csv-file }

The following example illustrates how you can load a CSV file into a Spark DataFrame and score it:

``` scala
import com.datarobot.prediction.sparkapi.Predictors

val inputDf = spark.read.option("header", true).csv("input_data.csv")

val model = Predictors.getPredictor()
val output = model.transform(inputDf)

output.show()
```

### Load models at runtime {: #load-models-at-runtime }

The following examples illustrate how you can load a model's JAR file at runtime instead of using the `--jars` parameter of `spark-shell`:

=== "From DataRobot"

     Define the `PROJECT_ID`, the `MODEL_ID`, and your `API_TOKEN`.

     ``` scala
     val model = Predictors.getPredictorFromServer(
          "https://app.datarobot.com/projects/PROJECT_ID/models/MODEL_ID/blueprint","API_TOKEN")
     ```

=== "From HDFS filesystem"

     Define the path to the model JAR file and the `MODEL_ID`.

     ``` scala
     val model = Predictors.getPredictorFromHdfs("path/to/model.jar", spark, "MODEL_ID")
     ```

### Time series scoring {: #time-series-scoring }

Time series scoring uses the same `transform` method as non-time series scoring. In addition, you can customize the time series parameters with the `TimeSeriesOptions` builder.

If you don't provide additional arguments for a time series model through the `TimeSeriesOptions` builder, the `transform` method returns predictions for an automatically detected forecast point:

``` scala
val model = Predictors.getPredictor()
val forecastPointPredictions = model.transform(timeSeriesDf)
```

To define a forecast point, you can use the `buildSingleForecastPointRequest()` builder method:

``` scala
import com.datarobot.prediction.TimeSeriesOptions

val tsOptions = new TimeSeriesOptions.Builder().buildSingleForecastPointRequest("2010-12-05")
val model = Predictors.getPredictor(modelId, tsOptions)
val output = model.transform(inputDf)
```

To return historical predictions, you can define a start date and an end date through the `buildForecastDateRangeRequest()` builder method:

``` scala
val tsOptions = new TimeSeriesOptions.Builder().buildForecastDateRangeRequest("2010-12-05", "2011-01-02")
val model = Predictors.getPredictor(modelId, tsOptions)
```

For a complete reference, see the [`TimeSeriesOptions` javadoc](https://javadoc.io/doc/com.datarobot/datarobot-prediction/latest/com/datarobot/prediction/TimeSeriesOptions.Builder.html){ target=_blank }.


## Deprecated Spark libraries {: #deprecated-spark-libraries }

Support for Spark versions earlier than 2.4.1, and for Spark compiled for Scala versions earlier than 2.12, is deprecated. If necessary, you can still access the deprecated libraries published on Maven Central; however, they will not receive any further updates.

The following libraries are deprecated:

| Name                                                                                                                               | Spark version | Scala version |
|------------------------------------------------------------------------------------------------------------------------------------|---------------|---------------|
| [scoring-code-spark-api_1.6.0](https://central.sonatype.com/artifact/com.datarobot/scoring-code-spark-api_1.6.0){ target=_blank }  | 1.6.0         | 2.10          |
| [scoring-code-spark-api_2.4.3](https://central.sonatype.com/artifact/com.datarobot/scoring-code-spark-api_2.4.3){ target=_blank }  | 2.4.3         | 2.11          |
| [scoring-code-spark-api_3.0.0](https://central.sonatype.com/artifact/com.datarobot/scoring-code-spark-api_3.0.0){ target=_blank }  | 3.0.0         | 2.12          |